Pruning Decision Trees via Max-Heap Projection | Proceedings of the 2017 SIAM International Conference on Data Mining | Society for Industrial and Applied Mathematics

نویسندگان

  • Zhi Nie
  • Binbin Lin
  • Shuai Huang
  • Naren Ramakrishnan
  • Wei Fan
  • Jieping Ye
چکیده

The decision tree model has gained great popularity both in academia and industry due to its capability of learning highly non-linear decision boundaries, and at the same time, still preserving interpretability that usually translates into transparency of decision-making. However, it has been a longstanding challenge for learning robust decision tree models since the learning process is usually sensitive to data and many existing tree learning algorithms lead to overfitted tree structures due to the heuristic and greedy nature of these algorithms. Pruning is usually needed as an ad-hoc procedure to prune the tree structure, which is, however, not guided by a rigorous optimization formulation but by some intuitive statistical justification. Motivated by recent developments in sparse learning, in this paper, we propose a novel formulation that recognizes an interesting connection between decision tree post-pruning and sparse learning, where the tree structure can be embedded as constraints in the sparse learning framework via the use of a maxheap constraint as well as a sparsity constraint. This novel formulation leads to a non-convex optimization problem which can be solved by an iterative shrinkage algorithm in which the proximal operator can be solved by an efficient max-heap projection algorithm. A stability selection method is further proposed for enabling robust model selection in practice and guarantees the selected nodes preserve tree structure. Extensive experimental results demonstrate that our proposed method achieves better predictive performance than many existing benchmark methods across a wide range of real-world datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multimodal Network Alignment | Proceedings of the 2017 SIAM International Conference on Data Mining | Society for Industrial and Applied Mathematics

A multimodal network encodes relationships between the same set of nodes in multiple settings, and network alignment is a powerful tool for transferring information and insight between a pair of networks. We propose a method for multimodal network alignment that computes a matrix which indicates the alignment, but produces the result as a lowrank factorization directly. We then propose new meth...

متن کامل

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

متن کامل

Graph Regularized Meta-path Based Transductive Regression in Heterogeneous Information Network | Proceedings of the 2015 SIAM International Conference on Data Mining | Society for Industrial and Applied Mathematics

A number of real-world networks are heterogeneous information networks, which are composed of different types of nodes and links. Numerical prediction in heterogeneous information networks is a challenging but significant area because network based information for unlabeled objects is usually limited to make precise estimations. In this paper, we consider a graph regularized meta-path based tra...

متن کامل

Contextual Time Series Change Detection | Proceedings of the 2013 SIAM International Conference on Data Mining | Society for Industrial and Applied Mathematics

Time series data are common in a variety of fields ranging from economics to medicine and manufacturing. As a result, time series analysis and modeling has become an active research area in statistics and data mining. In this paper, we focus on a type of change we call contextual time series change (CTC) and propose a novel two-stage algorithm to address it. In contrast to traditional change de...

متن کامل

Concept Drift Detection with Hierarchical Hypothesis Testing | Proceedings of the 2017 SIAM International Conference on Data Mining | Society for Industrial and Applied Mathematics

When using statistical models (such as a classifier) in a streaming environment, there is often a need to detect and adapt to concept drifts to mitigate any deterioration in the model’s predictive performance over time. Unfortunately, the ability of popular concept drift approaches in detecting these drifts in the relationship of the response and predictor variable is often dependent on the dis...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017